SUMMARY


This Memorandum describes how Fisher's Linear Discriminant can be combined with the Fukunaga-Koontz transform to give a useful technique for reducing a feature space from many dimensions to two or three. Performance is seen to be superior in general to the Foley-Sammon extension of Fisher's method. The technique is then extended to show how a new radius vector (or pair of radius vectors) can be combined with Fisher's vector to produce a classifier with even greater discriminating power. Illustrations of the technique show that good discrimination can be obtained even if there is considerable overlap of classes in any single projection.

Index Terms: Dimensionality reduction, discriminant vectors, feature selection, Fisher criterion, linear transformations, separability.
1  INTRODUCTION

Fisher's linear discriminant function(1,2) makes a useful classifier where the two classes have features with well separated means compared with their scatter. The method finds the vector which, when the training data is projected on to it, maximises the class separation. It is a many-to-one linear transformation.

At the other extreme, for the case in which both classes have the same mean but different variances, Fukunaga and Koontz have described(3) a transform which maximises class discrimination. This method transforms the data so that the joint (sum) covariance matrix for the two classes is the identity matrix, and then selects the eigenvector with the largest difference in eigenvalues for the two classes. The data is then projected on to this eigenvector to produce the classifier.


An advantage of linear projections such as these is that they can give the system designer some appreciation of class separation if the training data is presented as histograms of points projected on to the discriminant vector.

An even better appreciation is gained if the training data is presented as a two-dimensional scatter diagram whose dimensions are chosen to be those which show the best classification, in some sense. Such projections on to two dimensions also allow the operator to select curved or piecewise linear decision boundaries as a pattern classifier. This interactive approach to classifier design has been found to yield more productive results than non-interactive methods such as the quadratic discriminant function, especially where the data is non-Gaussian.

The quadratic discriminant method operates as follows. First the training data is used to estimate the class means and covariance matrices. These are then used to generate multivariate Gaussian probability density functions (pdfs) for the two classes. A new data point is classified by evaluating the two pdfs at that point and assigning the point to the class with the greater probability density. If the data is truly Gaussian, and if enough training points are available to estimate the means and covariances accurately, this method produces the optimum Bayesian classifier (for equal prior probabilities).
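The quadratic discriminant procedure just described can be sketched in Python with NumPy as follows. This is a minimal illustration, not the classifier used for the results in this Memorandum: equal prior probabilities are assumed, and the function names (`fit_gaussian`, `log_gaussian_pdf`, `quadratic_classify`) and the synthetic data are hypothetical. Log densities are compared rather than the densities themselves; this gives the same decision but avoids numerical underflow.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the class mean and covariance matrix from training rows X (n x L)."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_gaussian_pdf(x, mu, W):
    """Log of the multivariate Gaussian pdf with mean mu and covariance W at point x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(W)
    return -0.5 * (mu.size * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(W, d))

def quadratic_classify(x, params_a, params_b):
    """Assign x to the class whose estimated pdf is larger at x (equal priors assumed)."""
    if log_gaussian_pdf(x, *params_a) >= log_gaussian_pdf(x, *params_b):
        return "a"
    return "b"

# Hypothetical training data: common mean, class B has three times the standard deviation.
rng = np.random.default_rng(0)
params_a = fit_gaussian(rng.normal(0.0, 1.0, size=(200, 3)))
params_b = fit_gaussian(rng.normal(0.0, 3.0, size=(200, 3)))
print(quadratic_classify(np.zeros(3), params_a, params_b))      # point at the common mean
print(quadratic_classify(np.full(3, 6.0), params_a, params_b))  # point far from the mean
```

The decision boundary implied by this rule is the set of points where the two log pdfs are equal.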

The decision boundary consists of those points where the pdfs are equal, and this will in general be a multidimensional quadratic surface in feature space.

In practice training data is often non-Gaussian, and the interactive approach using projections of the data is preferred. The problem is that there is often a huge number of pairs of features which could be examined, and a methodology is needed which standardises the data and points to possible two-dimensional projections where discrimination may be high. Ideally the method should also project the Bayesian decision boundary into a unique line in the two-dimensional subspace. The classifier will then retain the Bayesian error rate if the data should happen to be Gaussian.

Foley and Sammon(4) have suggested an extension to Fisher's method which gives a two-dimensional (or higher) projection for displaying data. Their method is based on finding Fisher's vector first; the data is then projected on to the subspace normal to Fisher's vector and the process of finding Fisher's vector is repeated in that subspace. The data is then displayed projected on to the plane spanned by these two vectors, and a decision boundary is constructed in that plane. It is shown later in this Memorandum that this method is of doubtful value for finding the best classification subspace.
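The Foley-Sammon procedure as described above (Fisher's vector, then Fisher's vector again within the hyperplane normal to it) might be sketched as follows, with NumPy assumed. This follows the description given in this Memorandum rather than any closed-form expression from Foley and Sammon's paper; working in an orthonormal basis of the normal hyperplane is our own implementation choice, and the function names and data are hypothetical.

```python
import numpy as np

def orth_complement(f):
    """Orthonormal basis (as columns) of the hyperplane normal to vector f."""
    f = f / np.linalg.norm(f)
    # I - f f^T has eigenvalue 0 along f and 1 on the hyperplane; eigh sorts
    # eigenvalues in ascending order, so the last L-1 eigenvectors span the hyperplane.
    _, V = np.linalg.eigh(np.eye(f.size) - np.outer(f, f))
    return V[:, 1:]

def fisher_criterion(d, delta, W):
    """Fisher criterion (d^T delta)^2 / (d^T W d) for a direction d."""
    return (d @ delta) ** 2 / (d @ W @ d)

def foley_sammon_pair(Xa, Xb):
    """First and second discriminant vectors for training rows Xa and Xb."""
    delta = Xa.mean(axis=0) - Xb.mean(axis=0)
    W = np.cov(Xa, rowvar=False) + np.cov(Xb, rowvar=False)
    f1 = np.linalg.solve(W, delta)
    f1 /= np.linalg.norm(f1)
    # Repeat Fisher's method using co-ordinates within the hyperplane normal to f1.
    Q = orth_complement(f1)
    g = np.linalg.solve(Q.T @ W @ Q, Q.T @ delta)
    f2 = Q @ g
    return f1, f2 / np.linalg.norm(f2), delta, W

rng = np.random.default_rng(1)
Xa = rng.normal([2.0, 1.0, 0.0], [1.0, 2.0, 0.5], size=(300, 3))
Xb = rng.normal([0.0, 0.0, 0.5], [1.5, 1.0, 1.0], size=(300, 3))
f1, f2, delta, W = foley_sammon_pair(Xa, Xb)
```

The second vector is, by construction, the direction normal to the first that maximises the Fisher criterion.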

The methods proposed in this Memorandum are based on applying a standardising transform to the training data. This then allows Fisher's method to be used in conjunction with Fukunaga's method to select the best two-dimensional linear projection. The method is then extended to show how a nonlinear combination of features can result in a two- or three-dimensional scatter diagram whose performance is found to be better than that of the linear method in a number of cases. The method further allows the Bayesian decision surface to be represented uniquely by a line in the subspace for multivariate Gaussian data satisfying certain conditions.

2  OBSERVATIONS  ON  FISHER'S  METHOD 

Fisher's method finds the vector $F$ which gives the greatest class separation (as defined by a criterion function) to data points projected on to the vector. The criterion function is:

$$ J(F) = \frac{\left(F^T(\mu_a - \mu_b)\right)^2}{F^T (W_a + W_b) F} $$

where $\mu_i$ is the class mean for class $i$, $i = a, b$, and $W_i$ is the covariance matrix for class $i$.

The vector solution to this maximising problem can be shown to be:

$$ F = (W_a + W_b)^{-1} (\mu_a - \mu_b) $$
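As a small numerical illustration (NumPy assumed; the two-feature parameters below are hypothetical), the closed-form vector can be computed with a linear solve and compared with the criterion evaluated on random directions. The maximum value of the criterion is $(\mu_a-\mu_b)^T(W_a+W_b)^{-1}(\mu_a-\mu_b)$, attained at $F$.

```python
import numpy as np

def fisher_vector(mu_a, mu_b, W_a, W_b):
    """F = (W_a + W_b)^{-1} (mu_a - mu_b), computed with a linear solve."""
    return np.linalg.solve(W_a + W_b, mu_a - mu_b)

def fisher_criterion(F, mu_a, mu_b, W_a, W_b):
    """J(F) = (F^T (mu_a - mu_b))^2 / (F^T (W_a + W_b) F)."""
    return (F @ (mu_a - mu_b)) ** 2 / (F @ (W_a + W_b) @ F)

# Hypothetical example: class A features are correlated, class B's are not.
mu_a, mu_b = np.array([1.0, 0.0]), np.array([0.0, 0.0])
W_a = np.array([[1.0, 0.8], [0.8, 1.0]])
W_b = np.eye(2)
F = fisher_vector(mu_a, mu_b, W_a, W_b)
J_max = fisher_criterion(F, mu_a, mu_b, W_a, W_b)
rng = np.random.default_rng(0)
J_random = [fisher_criterion(rng.normal(size=2), mu_a, mu_b, W_a, W_b) for _ in range(1000)]
print(J_max, max(J_random))   # no random direction exceeds J(F)
```

Solving the linear system rather than forming the inverse explicitly is the usual numerically preferable choice.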

It should be mentioned that maximising this criterion function does not necessarily produce the best projection for classification, as shown by Malina(5). However the differences are usually very small, and Fisher's method is used in this Memorandum because it leads to the interesting generalisations and extensions shown here.

It is clear that a data set can be transformed on to a new set of co-ordinates without loss or gain of discriminating performance provided the transform is one-to-one (ie invertible). A decision boundary in one co-ordinate system maps on to the other with the same number of true or false classifications on either side. Now the decision threshold on the Fisher axis corresponds to a hyperplane decision surface in feature space, where the hyperplane is normal to the Fisher axis and intersects it at the decision threshold.

It is shown in Appendix A that if a linear transformation is applied to the data, the same decision threshold is generated whether the Fisher vector is found before or after the transformation.

An interesting transform which can be applied to a training data set is one which makes the joint scatter matrix for the training data an identity matrix. Such a transform can be visualised by first applying a rotation of axes (orthogonal transform) so that the eigenvectors of the scatter matrix form the orthogonal co-ordinate system (the Karhunen-Loeve transform). Each co-ordinate can then be scaled so that the variances are unity, thus giving a unit scatter matrix.
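The two-step construction just described (Karhunen-Loeve rotation, then rescaling to unit variance) can be written as a single matrix. The sketch below assumes NumPy, takes the joint scatter matrix to be $W_a + W_b$ (as in Section 4) and assumes it is positive definite; the function name and the example matrices are illustrative only.

```python
import numpy as np

def standardising_transform(W_a, W_b):
    """Return T such that T (W_a + W_b) T^T = I.

    The rows of E^T rotate the data on to the eigenvectors of the joint
    scatter matrix (the Karhunen-Loeve transform); dividing each row by the
    square root of its eigenvalue rescales that co-ordinate to unit variance.
    """
    lam, E = np.linalg.eigh(W_a + W_b)   # joint scatter = E diag(lam) E^T
    return np.diag(1.0 / np.sqrt(lam)) @ E.T

W_a = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
W_b = np.diag([1.0, 2.0, 0.5])
T = standardising_transform(W_a, W_b)
print(np.round(T @ (W_a + W_b) @ T.T, 12))   # the identity matrix, up to rounding
```

A data point $x$ is then standardised as `T @ x`.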

In this new co-ordinate system the Fisher axis is

$$ F' = I^{-1}(\mu_a' - \mu_b') = \mu_a' - \mu_b' $$

where primes denote quantities in the standardised co-ordinates.

That is, $F'$ is parallel to the line through the means of the two distributions; see Figures 1 and 2. This appears to be a useful way of standardising the use of Fisher's method, and there is no loss or gain in performance when the method is applied to the transformed training data rather than to the untransformed data.

It is also evident that if we apply this standardising transform and then project the data on to the hyperplane normal to the new Fisher vector, the two distributions obtained will have coincident means (see Figure 3).
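Both observations can be checked numerically. In the sketch below (NumPy assumed; the class parameters are hypothetical), the Fisher vector computed in the standardised co-ordinates equals the difference of the transformed means, and projecting the two transformed means on to the hyperplane normal to it gives the same point.

```python
import numpy as np

def standardising_transform(W_a, W_b):
    """T such that T (W_a + W_b) T^T = I (K-L rotation followed by rescaling)."""
    lam, E = np.linalg.eigh(W_a + W_b)
    return np.diag(1.0 / np.sqrt(lam)) @ E.T

mu_a, mu_b = np.array([2.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])
W_a = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
W_b = np.diag([1.0, 2.0, 0.5])

T = standardising_transform(W_a, W_b)
m_a, m_b = T @ mu_a, T @ mu_b                                # transformed means
F_new = np.linalg.solve(T @ (W_a + W_b) @ T.T, m_a - m_b)    # Fisher vector after standardising
f = F_new / np.linalg.norm(F_new)
P = np.eye(3) - np.outer(f, f)                               # projector on to the hyperplane normal to F'
print(np.allclose(F_new, m_a - m_b), np.allclose(P @ m_a, P @ m_b))
```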


3  EXTENSIONS  TO  FISHER'S  METHOD


As mentioned earlier, it would be convenient if we could combine Fisher's vector with some other discriminant function to give a two-dimensional representation, simply because it is easy to plot two-dimensional scatter diagrams, and also because two dimensions should, in general, give better discrimination than one.

So on what basis do we select another dimension? If the standardising transform is applied first then, as we have seen, the clusters in the subspace normal to Fisher's axis will have coincident means. This makes the task of finding a second Fisher axis, as the Foley-Sammon method requires, impossible. It is this fact that makes the Foley-Sammon method rather suspect, and it was the recognition of this which led to the identification of the generalisations described in this Memorandum. It is believed that these new methods do improve discrimination, and illustrative examples are given to show the improvements which can be obtained when the methods are used instead of Foley-Sammon.

4  FISHER  WITH  FUKUNAGA-KOONTZ 

In all the methods described from here onwards the first step is to transform the data to give an identity joint covariance matrix (the standardising transform). The Fisher axis is then the axis through the means. If the data is projected on to the hyperplane perpendicular to the Fisher axis, the means will be coincident. To obtain the maximum difference between the two classes we can look for the projection which maximises the difference in variances (normalised by the sum of the variances). It is then evident from Figure 4 that the bigger the difference, the better the classifier.

If the two classes have covariance matrices $W_a$ and $W_b$, let $T$ be the standardising transform such that:

$$ T (W_a + W_b) T^T = I $$

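Fukunaga showed that the matrices

$$ T W_a T^T \quad \text{and} \quad T W_b T^T $$

have the same eigenvectors (since $T W_b T^T = I - T W_a T^T$), that all their eigenvalues are bounded by 0 and 1, and that each pair of corresponding eigenvalues sums to 1, ie

$$ \lambda_{ia} + \lambda_{ib} = 1 $$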

It is clear from this that the axis which gives the biggest difference in variances for the two classes is the eigenvector with the biggest difference in eigenvalue for the two classes.
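A sketch of the Fisher plus F-K construction follows (NumPy assumed; the function name and parameters are hypothetical). Following the description at the start of this section, the F-K eigenvectors are sought within the hyperplane normal to the standardised Fisher axis. Because $T(W_a+W_b)T^T = I$, the two projected class covariances in that hyperplane still sum to the identity, so they share eigenvectors and their eigenvalues pair up to sum to 1.

```python
import numpy as np

def fisher_fk_axes(mu_a, mu_b, W_a, W_b):
    """Return (T, f, g, lam_a, lam_b): standardising transform T, unit Fisher axis f
    and F-K axis g (both in standardised co-ordinates), and the paired eigenvalues."""
    lam, E = np.linalg.eigh(W_a + W_b)
    T = np.diag(1.0 / np.sqrt(lam)) @ E.T            # T (W_a + W_b) T^T = I
    f = T @ (mu_a - mu_b)                             # Fisher axis: through the means
    f /= np.linalg.norm(f)
    _, V = np.linalg.eigh(np.eye(f.size) - np.outer(f, f))
    Q = V[:, 1:]                                      # orthonormal basis normal to f
    C_a = Q.T @ T @ W_a @ T.T @ Q                     # class covariances in that hyperplane
    C_b = Q.T @ T @ W_b @ T.T @ Q                     # equals I - C_a
    lam_a, U = np.linalg.eigh(C_a)
    lam_b = np.diag(U.T @ C_b @ U)                    # same eigenvectors, so this is diagonal
    k = np.argmax(np.abs(lam_a - lam_b))              # biggest difference in eigenvalue
    return T, f, Q @ U[:, k], lam_a, lam_b

mu_a, mu_b = np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(4)
W_a = np.diag([1.0, 0.5, 1.0, 0.2])
W_b = np.diag([1.0, 1.0, 0.4, 1.0])
T, f, g, lam_a, lam_b = fisher_fk_axes(mu_a, mu_b, W_a, W_b)
# A point x maps to the plane co-ordinates (f @ (T @ x), g @ (T @ x)).
```

In this example the selected axis corresponds to the fourth feature, whose normalised variances are 1/6 and 5/6.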

Thus the Fisher projection combined with the Fukunaga-Koontz (F-K) projection gives a linear projection from many dimensions to two, with a performance usually better, and never worse, than that of Fisher combined with the Foley-Sammon (F-S) projection (for multivariate Gaussian data). Figures 6 and 7 illustrate an example where the two classes have different means and where the F-S and F-K vectors are different. Figure 5 shows the parameters used to generate the test data.

Figure 6 shows the scatter diagram for F-S, with 100 points for each class used to train and test. Figure 7 shows the same data with Fisher and F-K, indicating a clear improvement (total errors of about 6.5% against about 12%).
5  FISHER  WITH  A  RADIUS  VECTOR


This use of the F-K transform suggests an even more powerful (nonlinear) many-to-two transform. Consider a three-feature, two-class problem. After the standardising transform and projection on to the plane normal to the Fisher axis, we might obtain distributions as shown in Figure 3. If all the eigenvalues for Class A are less than those for Class B, then the Bayesian decision surface can be shown to be an ellipse with the eigenvectors as axes (see Appendix B).

By rotating and rescaling the data, the Bayesian surface can clearly be made into a circle with Class A inside and Class B outside. The only information needed to test whether a new data point lies inside or outside is its radius from the common mean, which is compared with the radius of the Bayesian threshold. In the more general multidimensional case the Bayesian surface can be made into a hypersphere if the eigenvalues of one class are all less than those of the other: classification in this case involves assigning a new point to Class A or B according to whether it lies inside the hypersphere (class with smaller eigenvalues) or outside it (class with larger eigenvalues).

For either of these cases the data set can be mapped from the original multidimensional feature space down to two dimensions, where distance along the Fisher axis is one feature and radius (Euclidean distance) from the Fisher axis (in the transformed space) is the other.
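The mapping to the Fisher-Radius plane might be sketched as below for the case in which all the class A eigenvalues (in the hyperplane normal to the Fisher axis) are smaller than the corresponding class B ones. NumPy is assumed, and `fisher_radius_map` and the test parameters are hypothetical. The rescaling uses the Appendix B factors $s_i = (1/2\lambda_{ia} - 1/2\lambda_{ib})^{1/2}$, so that the common-mean Bayesian surface becomes a circle; since every step after centring is linear, the rotation, F-K alignment and rescaling are folded into one matrix `M`.

```python
import numpy as np

def fisher_radius_map(mu_a, mu_b, W_a, W_b):
    """Return a function mapping points x (rows of X) to (t, r): t is the distance
    along the Fisher axis, r the rescaled radius from it (standardised space)."""
    lam, E = np.linalg.eigh(W_a + W_b)
    T = np.diag(1.0 / np.sqrt(lam)) @ E.T                 # standardising transform
    f = T @ (mu_a - mu_b)
    f /= np.linalg.norm(f)                                # unit Fisher axis
    _, V = np.linalg.eigh(np.eye(f.size) - np.outer(f, f))
    Q = V[:, 1:]                                          # hyperplane normal to f
    lam_a, U = np.linalg.eigh(Q.T @ T @ W_a @ T.T @ Q)    # F-K eigenvectors
    lam_b = 1.0 - lam_a                                   # eigenvalues pair to sum to 1
    if not np.all(lam_a < lam_b):
        raise ValueError("class A eigenvalues must all be smaller than class B's")
    s = np.sqrt(1.0 / (2 * lam_a) - 1.0 / (2 * lam_b))    # Appendix B scaling factors
    M = np.diag(s) @ U.T @ Q.T @ T                        # rotate, align and rescale in one step

    def features(X):
        X = np.atleast_2d(X)
        t = X @ T.T @ f                                   # Fisher co-ordinate
        # Projected means coincide, so mu_a serves as the common centre.
        r = np.linalg.norm((X - mu_a) @ M.T, axis=1)
        return t, r
    return features

mu_a, mu_b = np.array([2.0, 0.0, 0.0]), np.zeros(3)
W_a = np.diag([1.0, 0.5, 0.5])
W_b = np.diag([1.0, 2.0, 3.0])
features = fisher_radius_map(mu_a, mu_b, W_a, W_b)
rng = np.random.default_rng(3)
t_a, r_a = features(rng.multivariate_normal(mu_a, W_a, size=500))
t_b, r_b = features(rng.multivariate_normal(mu_b, W_b, size=500))
```

Plotting `t` against `r` for the two classes gives a Fisher-Radius scatter diagram of the kind discussed in the text.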

This method will be better than the use of Fisher with one F-K axis alone. Where the class distribution functions have circular symmetry about the Fisher axis, the Bayesian surface will also have circular symmetry and will map on to a unique line in the Fisher-Radius (F-R) plane. Hence in this case the performance of the F-R space is optimal and equal to the performance of a Bayesian classifier in the full feature space (for Gaussian data).

To use this method in the more general case, where not all eigenvalues of one class are less than those of the other, we divide the training data into two subsets of reduced dimensionality: the first subspace contains only those features where $\lambda_{ia} < \lambda_{ib}$, and the second only those features where $\lambda_{ib} < \lambda_{ia}$.

Any features for which $\lambda_{ia} = \lambda_{ib}$ should not be included in the radius calculation.
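Given decorrelated co-ordinates about the common mean (for instance those produced by the F-K step) and the paired eigenvalues, the two radii might be formed as in the following sketch (NumPy assumed; `split_radii` and the tolerance `tol` are hypothetical). The sign of each coefficient of $x_i^2$ in equation (B3) of Appendix B decides which group an axis joins, and axes with equal eigenvalues are dropped.

```python
import numpy as np

def split_radii(u, lam_a, lam_b, tol=1e-9):
    """Two rescaled radii from decorrelated co-ordinates u (rows = points).

    r1 uses the axes where lam_a < lam_b (class A inside, class B outside);
    r2 uses the axes where lam_b < lam_a (class B inside, class A outside);
    axes whose eigenvalues are equal (within tol) are left out of both.
    """
    u = np.atleast_2d(u)
    c = 1.0 / (2 * lam_a) - 1.0 / (2 * lam_b)     # coefficient of x_i^2 in (B3)
    s = np.sqrt(np.abs(c))                         # rescaling factor per axis
    group1, group2 = c > tol, c < -tol
    r1 = np.linalg.norm(u[:, group1] * s[group1], axis=1)
    r2 = np.linalg.norm(u[:, group2] * s[group2], axis=1)
    return r1, r2

lam_a = np.array([0.2, 0.5, 0.7])
lam_b = 1.0 - lam_a                # F-K eigenvalues pair to sum to 1
r1, r2 = split_radii(np.array([1.0, 1.0, 1.0]), lam_a, lam_b)
```

The better of `r1` and `r2` can be paired with the Fisher distance for a two-dimensional display, or both used for three.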

Figure 11 shows a schematic diagram of the classifier using this method. If a two-dimensional classifier is required, the subset with the best performance can be selected; otherwise a three-dimensional classifier can be constructed from the Fisher distance and both radii.

Figure 8 shows the scatter in the F-R plane using the same data as used for the Fisher F-S method (Figure 5). It is seen that the error rate reduces from 6% with F-K to 2% with the radius for this example. The advantage of using the radius vector can be seen even more clearly when more features are available, as in the ten-dimensional example of Figure 9. Notice that there is no increase in the mean differences or in the variance differences. The scatter diagram obtained using the Fisher-Radius method indicates an average error rate approaching zero. If the linear methods were applied they would show no improvement over the five-dimensional case, because no use is made of the additional features. All the data shown here was generated to give multivariate Gaussian statistics.

The quadratic discriminant function gives optimum performance if the data is multivariate Gaussian, but the additional scope given by the procedures described here allows a better performance to be obtained if the data is highly non-Gaussian. For instance, we found that data generated with a negative exponential distribution was better classified using the Fisher radius vector than using the quadratic discriminant function.
6  CONCLUSIONS


If data is standardised with a linear transform to give a unit joint covariance matrix, the Foley-Sammon axis becomes meaningless because the two classes have coincident means in the hyperplane normal to the Fisher axis.

In  this  case  the  Fukunaga-Koontz  transform  allows  the  next  best  feature  to 
be  selected. 

Another simple linear transform can give a spherical Bayesian decision surface on features where all eigenvalues are smaller for one of the classes (see Appendix B). In this case distance along the Fisher axis and radius from the Fisher axis form a powerful discriminating function. If the eigenvalues are not all smaller for one class, the features can be divided into two groups and two radii calculated, with the better or both being used with the Fisher distance to provide the discriminating function. Both linear transforms can be combined into a single operation. Figure 11 shows how simple the implementation of this classifier would be.

The arguments used in this paper apply to multivariate Gaussian distributions. In practice distributions are not so simple. However we believe that data can be standardised and inspected using the procedures described here as a first approach to the classifier design problem. A good classifier may result, perhaps with some excising of pathological features or with the inclusion of special stages to include highly non-Gaussian but well discriminating features.

The  methods  described  here  are  for  two-class  problems  only.  However  they 
are  particularly  suited  to  the  technique  of  reducing  a  many-class  problem  to 
that  of  many  two-class  problems. 

In this form the problem is to identify one species against the world background of other species. This usually results in the world background class having a larger variance than the required species, and our method takes advantage of this characteristic and can give good performance even when the two class means are similar.
APPENDIX  A  INVARIANCE  OF  FISHER'S  METHOD


We show here that the decision surface produced by the Fisher linear discriminant function is invariant under a transform $y = T x$, where $T$ is a non-singular linear transform.

Suppose we have a data set $x$ (where $x$ is an $L$-dimensional vector). The mean and covariance of $x$ are then defined as

$$ \mu = E(x) $$

$$ W = E\left((x - \mu)(x - \mu)^T\right) $$

The Fisher discriminant vector $F$ for the two classes is then found from(1)

$$ F = (W_a + W_b)^{-1} (\mu_a - \mu_b) \qquad \text{(A1)} $$

Suppose the data is now transformed by $T$ such that $y = T x$, where $T$ is a non-singular linear transform. Then the mean of the transformed data is $\mu' = T\mu$ and its covariance matrix is:

$$ W' = E\left(T (x - \mu)(x - \mu)^T T^T\right) = T W T^T $$

Let the vector $P$ be normal to the Fisher vector in the untransformed space, ie

$$ P^T F = 0 \qquad \text{(A2)} $$

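The Fisher vector $F'$ in the transformed space is obtained from:

$$
\begin{aligned}
F' &= (W_a' + W_b')^{-1} (\mu_a' - \mu_b') \\
   &= (T W_a T^T + T W_b T^T)^{-1} T (\mu_a - \mu_b) \\
   &= \left(T (W_a + W_b) T^T\right)^{-1} T (\mu_a - \mu_b) \\
   &= (T^T)^{-1} (W_a + W_b)^{-1} T^{-1} T (\mu_a - \mu_b) \\
   &= (T^T)^{-1} (W_a + W_b)^{-1} (\mu_a - \mu_b) = (T^T)^{-1} F \qquad \text{(A3)}
\end{aligned}
$$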
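Equation (A3) can be checked numerically with random, hypothetical parameters (NumPy assumed). The sketch below also confirms that each point has the same projection on to the Fisher vector before and after the transform, and that the image $TP$ of a vector normal to $F$ is normal to $F'$.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 4
mu_a, mu_b = rng.normal(size=L), rng.normal(size=L)
A, B = rng.normal(size=(L, L)), rng.normal(size=(L, L))
W_a, W_b = A @ A.T + np.eye(L), B @ B.T + np.eye(L)     # symmetric positive definite
T = rng.normal(size=(L, L)) + 3.0 * np.eye(L)            # hypothetical non-singular transform

F = np.linalg.solve(W_a + W_b, mu_a - mu_b)                                # (A1)
F_new = np.linalg.solve(T @ W_a @ T.T + T @ W_b @ T.T, T @ (mu_a - mu_b))  # Fisher vector after transforming
x = rng.normal(size=L)                                                     # any data point
P = x - (x @ F) / (F @ F) * F                                              # a vector normal to F, as in (A2)
print(np.allclose(F_new, np.linalg.solve(T.T, F)))   # (A3): F' = (T^T)^{-1} F
print(np.isclose(F_new @ (T @ x), F @ x))            # same projection before and after
print(np.isclose((T @ P) @ F_new, 0.0))              # TP is normal to F'
```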

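We can show that any plane normal to the Fisher discriminant vector in untransformed space will be normal to the new Fisher vector in transformed space if we can show that the transformed vector $TP$ is normal to $F'$, ie $(TP)^T F' = 0$. Using (A3) and (A2):

$$ (TP)^T F' = P^T T^T (T^T)^{-1} F = P^T F = 0 $$

Moreover, for any point $x$, $F'^T (T x) = F^T T^{-1} T x = F^T x$, so every point has the same projection on to the Fisher axis before and after the transformation, and the same decision threshold therefore gives the same classifications.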
APPENDIX  B  HYPERSPHERICAL  DECISION  BOUNDARIES


This Appendix shows how and when the Bayesian decision surface for multivariate Gaussian distributions with common means can be transformed to a hypersphere.


The Fukunaga-Koontz transform shows how two distributions can be transformed to have common eigenvectors. The K-L transform extracted from the covariance matrix of one class can then be used to align the eigenvectors with a new co-ordinate system. In this system the two data sets are decorrelated and, as they have common means (taken as the origin), the pdfs can be written as:


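Class A

$$ p(x \mid a) = \prod_{i=1}^{L} \frac{1}{\sqrt{2\pi\lambda_{ia}}} \exp\left(-\frac{x_i^2}{2\lambda_{ia}}\right) \qquad \text{(B1)} $$

Class B

$$ p(x \mid b) = \prod_{i=1}^{L} \frac{1}{\sqrt{2\pi\lambda_{ib}}} \exp\left(-\frac{x_i^2}{2\lambda_{ib}}\right) \qquad \text{(B2)} $$

where $\lambda_{ia}$ and $\lambda_{ib}$ are the variances of classes A and B along the $i$-th co-ordinate and $L$ is the number of features.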


At the Bayesian decision boundary the pdfs are equal. Taking logs of equations (B1) and (B2) and equating gives:


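$$ \sum_{i=1}^{L} \frac{x_i^2}{2\lambda_{ia}} + \frac{1}{2}\log\prod_{i=1}^{L}\lambda_{ia} = \sum_{i=1}^{L} \frac{x_i^2}{2\lambda_{ib}} + \frac{1}{2}\log\prod_{i=1}^{L}\lambda_{ib} $$

or

$$ \sum_{i=1}^{L} x_i^2 \left(\frac{1}{2\lambda_{ia}} - \frac{1}{2\lambda_{ib}}\right) = \frac{1}{2}\log\prod_{i=1}^{L}\frac{\lambda_{ib}}{\lambda_{ia}} \qquad \text{(B3)} $$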


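If all the coefficients of $x_i^2$ have the same sign, this is the equation of a hyperellipse. A simple rescaling transform can be used to reduce this to a hypersphere, the scaling factor in the $i$-th co-ordinate being given by

$$ s_i = \left|\frac{1}{2\lambda_{ia}} - \frac{1}{2\lambda_{ib}}\right|^{1/2} $$

and the actual transform being

$$ S = \begin{pmatrix} s_1 & 0 & \cdots & 0 \\ 0 & s_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & s_L \end{pmatrix} $$

In the rescaled co-ordinates $z_i = s_i x_i$, equation (B3) becomes $\sum_i z_i^2 = \left|\frac{1}{2}\log\prod_i(\lambda_{ib}/\lambda_{ia})\right|$, a hypersphere centred on the common mean.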
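A short numerical check of this construction, with hypothetical variances in which every class A variance is the smaller (NumPy assumed): a point placed on the sphere $\sum_i z_i^2 = \frac{1}{2}\log\prod_i(\lambda_{ib}/\lambda_{ia})$ in the rescaled co-ordinates and mapped back by $x_i = z_i/s_i$ gives equal log pdfs under (B1) and (B2), while points inside the sphere favour class A and points outside favour class B.

```python
import numpy as np

lam_a = np.array([0.3, 0.5, 1.0])    # hypothetical class A variances
lam_b = np.array([1.2, 2.0, 1.5])    # class B variances, each larger

def log_pdf(x, lam):
    """Log of the zero-mean, decorrelated Gaussian pdf of (B1)/(B2)."""
    return -0.5 * np.sum(np.log(2 * np.pi * lam) + x**2 / lam)

c = 1 / (2 * lam_a) - 1 / (2 * lam_b)     # coefficients of x_i^2 in (B3), all positive here
K = 0.5 * np.sum(np.log(lam_b / lam_a))   # right-hand side of (B3)
s = np.sqrt(c)                            # scaling factors s_i
rng = np.random.default_rng(2)
w = rng.normal(size=3)
w /= np.linalg.norm(w)                    # random unit direction
x = np.sqrt(K) * w / s                    # on the sphere |z|^2 = K, mapped back to x
print(log_pdf(x, lam_a) - log_pdf(x, lam_b))   # zero, up to rounding
```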


If the coefficients of $x_i^2$ in equation (B3) have different signs, some sections of the surface will be saddle-shaped. To overcome this the data can be divided into two subsets whose coefficients have the same sign. If the two class variances of a feature are equal,

$$ \lambda_{ia} = \lambda_{ib} $$

then the $i$-th feature gives no further discrimination and can be disregarded.
Fig 1  Fisher's Linear Discriminant Function
Fig 2  Fisher's Linear Discriminant Function
after applying transform to make
$(W_a + W_b) = I$ and $\mu_a = 0$
Fig 3  Projection of data on to subspace
normal to F
Fig 4  Classification of data with common mean and

(a)  big  difference  in  variances 

(b)  small  difference  in  variances 


Fig  5  Some  projections  of  the  data  used  in  Fig  6-8  with 


Fig  6  Scatter  in  Fisher  Foley-Sammon  plane 
Total  errors  ~  12% 


Fig  7  Scatter  in  Fisher  Fukunaga-Koontz  plane
Total  errors  ~  6.5% 


Fig  8  Scatter  in  Fisher  Radius  plane 
Total  errors  ~  2% 


Fig  9  Scatter  in  Fisher  Radius  plane,
10-dimensional  data  generated  with